The replication crisis has eroded the public’s trust in science. Many famous studies, even those published in renowned journals, fail to produce the same results when replicated by other researchers. While this is the outcome of several problems in research, one aspect has received critical attention: reproducibility. The term reproducible research refers to studies that provide all materials necessary for other researchers to reproduce the scientific results. This allows others to identify flaws in calculations and improves scientific rigor. In this paper, we show a workflow for reproducible research using the R language and a set of additional packages and tools that simplify a reproducible research procedure.
The scientific database Scopus lists over 73,000 entries for the search term “reproducible research” at the time of writing this document. The importance of making research reproducible was recognized in the early 1950s in multiple research subjects. With the Reproducibility Project, the Open Science Collaboration (Open Science Collaboration and others 2015) found that at most about half of the psychological studies it examined could be replicated by other researchers. Several factors have contributed to this problem. From a high-level perspective, the pressure to publish and the increase in scientific output have led to a plethora of findings that will not replicate. Both bad research design and (possibly unintentional) bad research practices have increased the number of papers that hold little to no value. According to Baker (2016) and her article in Nature, more than half of researchers agree that there is a severe reproducibility crisis in science. The study also found that obstacles to reproducibility include (1) a lack of available analysis code, (2) a lack of available raw data, and (3) problems with reproduction efforts.
One problem that is often mentioned is HARKing (Kerr 1998), or “hypothesizing after the results are known”. When multiple statistical tests are conducted at a conventional alpha error rate (e.g., \(\alpha = .05\)), some tests are expected to reject the null hypothesis by chance alone; this is precisely what the error rate describes. If researchers then claim that these findings were their initial hypotheses, the results will be indiscernible from randomness. However, this is unknown to the reviewer or reader, who only hears about the new hypotheses. HARKing produces findings where there are none. It is thus crucial to determine the research hypothesis before collecting (or analyzing) the data.
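To make the size of this effect concrete, the following is a minimal sketch in R. It assumes independent tests and that every null hypothesis is true; the numbers of tests, the sample size of 30, and the 2,000 simulation runs are illustrative choices, not values from any study.

```r
# Probability of at least one false positive among m independent tests
# when every null hypothesis is true (family-wise error rate).
alpha <- 0.05
n_tests <- c(1, 5, 10, 20)           # hypothetical numbers of tests
fwer <- 1 - (1 - alpha)^n_tests
names(fwer) <- n_tests
print(round(fwer, 3))

# Simulation check (assumed setup): 20 one-sample t-tests on pure noise,
# repeated 2000 times. We record whether any test reached p < alpha.
set.seed(1)
any_significant <- replicate(2000, {
  p_values <- replicate(20, t.test(rnorm(30))$p.value)
  any(p_values < alpha)
})

# Share of simulated "studies" with at least one significant result;
# this should be close to the analytic value for 20 tests.
print(mean(any_significant))
```

With 20 tests, the analytic probability of at least one spurious "finding" is roughly 64%, which is why a hypothesis chosen after looking at the results carries little evidential weight.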
Another strategy applied (often without ill intent) is p-hacking (Head et al. 2015). This practice is widespread in scientific publications and is probably already shifting the scientific consensus. p-hacking refers to techniques that alter the data or the analysis until the desired p-value is reached. Omitting individual outliers, creating different grouping variables, and adding or removing control variables can all be considered p-hacking. This process also leads to results that will not hold under replication. It is crucial to show which modifications have been performed on the data so that readers can evaluate the interpretability of the p-values.
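One way to make such modifications visible is to perform them in a script rather than by hand. The sketch below is only an illustration: the data frame, the column names `id` and `reaction_time`, and the cutoff of 2000 ms are invented for this example and are assumed to have been fixed before looking at the results.

```r
# Hypothetical raw data; values are invented for illustration only.
raw <- data.frame(
  id = 1:6,
  reaction_time = c(510, 495, 530, 4800, 505, 520)
)

# Exclusion rule decided in advance (assumption): drop reaction times
# above 2000 ms. The raw data itself is never edited.
max_rt <- 2000
excluded <- raw[raw$reaction_time > max_rt, ]
clean <- raw[raw$reaction_time <= max_rt, ]

# Report the modification alongside the results so that readers can judge it.
message(
  nrow(excluded), " of ", nrow(raw),
  " rows excluded (reaction_time > ", max_rt, " ms); ids: ",
  paste(excluded$id, collapse = ", ")
)
```

Because the rule lives in the script, a reader can change or remove it and see how the p-values react.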
When researchers already “massage” the data to attain better p-values, it makes matters worse that many researchers do not understand the meaning of p-values. As Colquhoun (2017) found, many researchers misinterpret p-values and thus frame their findings as much stronger than they really are. Adequate reporting of p-values is thus also important for the interpretability of results.
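A short calculation illustrates why a significant p-value is not the same as a 95% chance that the effect is real. The inputs below (a significance threshold of .05, a power of .80, and the assumption that 10% of tested hypotheses are true) are illustrative assumptions, not figures taken from this paper.

```r
# Illustrative assumptions, not empirical values.
alpha <- 0.05   # significance threshold
power <- 0.80   # probability of detecting a real effect
prior <- 0.10   # assumed share of tested hypotheses that are actually true

# Expected shares of all tests that come out significant
false_positives <- alpha * (1 - prior)   # significant although no effect exists
true_positives <- power * prior          # significant and an effect exists

# Share of significant results that are false positives
false_positive_share <- false_positives / (false_positives + true_positives)
print(false_positive_share)
```

Under these assumptions, 36% of all significant results would be false positives, far more than the 5% that a naive reading of \(\alpha = .05\) suggests.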
Lastly, scientific journals have the problem that they are mostly interested in publishing significant results. Thus, contradictory “non-findings” seldom get published in renowned journals. There is little “value” for a researcher in publishing non-significant findings, as the additional work of writing a manuscript for something like arXiv often does not reap the same reward as a journal publication. This so-called publication bias (Simonsohn, Nelson, and Simmons 2014) worsens the crisis, as only significant findings end up being available. It is thus necessary to simplify the process of publishing non-significant results.
Many different solutions have been proposed to address these challenges (e.g., Marwick, Boettiger, and Mullen 2018; Wilson et al. 2017). However, no uniform process exists that allows creating documents and additional reproducibility materials in one workflow.
In this paper, we demonstrate a research workflow based on the R language and the R Markdown format. This paper was written using this workflow, and the sources are freely available online (https://www.osf.io/kcbj5). Our workflow directly addresses the challenge of producing both an LNCS paper and a companion website (https://sumidu.github.io/reproducibleR/) that includes additional material and downloadable data.
In this paper, we will focus on the following aspects:
We assume that the reader is somewhat familiar with the R programming language and knows that scientific analyses can be run using computational tools such as R, Python, Julia, or others. The guidance in this paper addresses the R user.
The Open Science Framework (OSF) describes three different kinds of reproducibility (Meyers 2017). Computational reproducibility means that other researchers who get access to your code and data will be able to reproduce your results. Empirical reproducibility means that your research provides sufficient information for other researchers to recreate your experiments and copy your study. Replicability refers to the quality of an outcome and a study, meaning that if the experiment were repeated, it would reach the same outcome. In this article, we provide tools for the first type of reproducibility only, as the latter two depend on your research content and not exclusively on your procedure. It is important to note that computationally reproducible research is valuable, but it is worthless when basic concepts of methods and research processes are ignored. If you measure incorrectly, your result may reproduce, but the finding may be wrong anyway. Hopefully, others will be able to point this out to you more easily.
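Two small habits already go a long way toward computational reproducibility: fixing the random seed and recording the software versions used. The following base R sketch illustrates both; the seed value 2020 is arbitrary and chosen only for this example.

```r
# Fix the random seed so that simulated or resampled values can be
# regenerated exactly by anyone who runs the script.
set.seed(2020)   # arbitrary seed, chosen for illustration
draw_1 <- rnorm(5)

set.seed(2020)
draw_2 <- rnorm(5)

# Resetting the seed reproduces the same random numbers.
same_draws <- identical(draw_1, draw_2)
print(same_draws)

# Record the R version and loaded packages together with the results.
session <- sessionInfo()
print(session$R.version$version.string)
```

Including the output of `sessionInfo()` in an appendix or on a companion website tells readers which versions produced the reported numbers.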
The central aim of a research compendium is to provide all data and information necessary to allow others to reproduce your findings from your data (Gentleman and Temple Lang 2007). There are several different ways of achieving this, but a central theme of research compendia is to organize data in a meaningful fashion. Since we are addressing R users, it makes sense to consider possible computing environments for R first.
R is the de facto standard among open-source, free-to-use tools for statistical analysis. In economics and the social sciences, similar tools that provide a GUI, such as SPSS, are used, with one immediate downside for reproducibility: if your analysis toolkit is proprietary, other users will not be able to reproduce your work without a significant investment. Moreover, using a GUI makes it untraceable, even to yourself, which analyses you have conducted. You might have manually deleted a row with broken data, or manually corrected a typing error in your data. If this is not docume
The most popular integrated development environment (IDE) for R is RStudio. RStudio comes with a license that allows researchers to freely use it for scientific purposes, and it integrates many of the tools described in this paper. The first strong tool for reproducible research with R is the RStudio project.
RStudio projects contain information about where your code, your data, and your output should reside on your computer. The benefit of RStudio projects is that they rely on relative path information, so when another user installs your project on their computer, it should work without a problem. Since you still need to refer to files in some cases, even with relative paths, the here package provides a helpful tool to access data relative to the main project directory. This works on Linux, Windows, and Mac computers.
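As a minimal sketch of this idea, the snippet below builds a file path from the project root with `here::here()`. The folder name `data` and the file name `survey.csv` are hypothetical, and the fallback to `file.path()` is only there so that the example also runs when the here package is not installed.

```r
# Build a path relative to the project root instead of hard-coding an
# absolute path such as "C:/Users/me/project/data/survey.csv".
# "data" and "survey.csv" are hypothetical names used for illustration.
if (requireNamespace("here", quietly = TRUE)) {
  data_path <- here::here("data", "survey.csv")
} else {
  # Fallback (assumption): path relative to the current working directory
  data_path <- file.path("data", "survey.csv")
}

# Only read the file if it exists on this machine.
if (file.exists(data_path)) {
  survey <- read.csv(data_path)
} else {
  message("No file found at: ", data_path)
}
```

The same script then works unchanged on a collaborator's computer, as long as the folder structure inside the project is preserved.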
rmdtemplates
CRAN.
renv
here, usethis (Wickham and Bryan 2019), drake (Landau 2020)
citr, gramr, questionr, esquisse
ggstatsplot
DiagrammeR (Iannone 2020)
Process diagrams such as the one in Figure 7.1 can easily be created using the DiagrammeR (Iannone 2020) package.
library(DiagrammeR)

# Render a small left-to-right process diagram from a Graphviz DOT description.
grViz(diagram = "
digraph boxes_and_circles {
  graph [rankdir = LR]

  node [shape = box
        fontname = Helvetica]
  'Setup OSF Project Site'
  Test

  node [shape = circle]
  Start

  edge []
  Start -> 'Setup OSF Project Site';
  'Setup OSF Project Site' -> Test;
}
")

Figure 7.1: Example
Option 1 sdcMicro
Option 2 anonymizer
On this sub-page you can find the data used as a downloadable file (CSV, Excel, or PDF).
We used the following packages to create this document:
Baker, Monya. 2016. “Reproducibility Crisis?” Nature 533 (26): 353–66.
Barnier, Julien. 2019. Rmdformats: HTML Output Formats and Templates for ’Rmarkdown’ Documents. https://CRAN.R-project.org/package=rmdformats.
Bryan, Jennifer. 2018. “Excuse Me, Do You Have a Moment to Talk About Version Control?” The American Statistician 72 (1). Taylor & Francis: 20–27.
Calero Valdez, André. 2020. Rmdtemplates: Rmdtemplates - an Opinionated Collection of Rmarkdown Templates. https://github.com/statisticsforsocialscience/rmd_templates.
Colquhoun, David. 2017. “The Reproducibility of Research and the Misinterpretation of P-Values.” Royal Society Open Science 4 (12). The Royal Society Publishing: 171085.
Gentleman, Robert, and Duncan Temple Lang. 2007. “Statistical Analyses and Reproducible Research.” Journal of Computational and Graphical Statistics 16 (1). Taylor & Francis: 1–23.
Head, Megan L, Luke Holman, Rob Lanfear, Andrew T Kahn, and Michael D Jennions. 2015. “The Extent and Consequences of P-Hacking in Science.” PLoS Biology 13 (3). Public Library of Science: e1002106.
Iannone, Richard. 2020. DiagrammeR: Graph/Network Visualization. https://CRAN.R-project.org/package=DiagrammeR.
Kerr, Norbert L. 1998. “HARKing: Hypothesizing After the Results Are Known.” Personality and Social Psychology Review 2 (3). Sage Publications: 196–217.
Landau, William Michael. 2020. Drake: A Pipeline Toolkit for Reproducible Computation at Scale. https://CRAN.R-project.org/package=drake.
Marwick, Ben, Carl Boettiger, and Lincoln Mullen. 2018. “Packaging Data Analytical Work Reproducibly Using R (and Friends).” The American Statistician 72 (1). Taylor & Francis: 80–88.
Meyers, Natalie K. 2017. “Reproducible Research and the Open Science Framework.” OSF. osf.io/458u9.
Open Science Collaboration, and others. 2015. “Estimating the Reproducibility of Psychological Science.” Science 349 (6251). American Association for the Advancement of Science: aac4716.
Simonsohn, Uri, Leif D Nelson, and Joseph P Simmons. 2014. “P-Curve and Effect Size: Correcting for Publication Bias Using Only Significant Results.” Perspectives on Psychological Science 9 (6). Sage Publications: 666–81.
Wickham, Hadley. 2019. Tidyverse: Easily Install and Load the ’Tidyverse’. https://CRAN.R-project.org/package=tidyverse.
Wickham, Hadley, and Jennifer Bryan. 2019. Usethis: Automate Package and Project Setup. https://CRAN.R-project.org/package=usethis.
Wickham, Hadley, and Dana Seidel. 2019. Scales: Scale Functions for Visualization. https://CRAN.R-project.org/package=scales.
Wilson, Greg, Jennifer Bryan, Karen Cranston, Justin Kitzes, Lex Nederbragt, and Tracy K Teal. 2017. “Good Enough Practices in Scientific Computing.” PLoS Computational Biology 13 (6). Public Library of Science: e1005510.
Xie, Yihui. 2020. Knitr: A General-Purpose Package for Dynamic Report Generation in R. https://CRAN.R-project.org/package=knitr.